### Advanced topics

- Forwarding unit design, Branch Predictors
- Superscalar processors and dynamic scheduling: algorithms for out-of-order (OOO) execution
- Hardware for OOO: reorder buffer and reservation stations
- Multi-core processors, multiple issue processors
- Multithreading
- Parallelism in instructions: SIMD, VLIW
- Interrupts/Exceptions/I/O

- Virtual memory, page tables, translation lookaside buffer (TLB)
- Multi-core processors and cache hierarchy
- Cache coherence protocols in multi-core processors, consistency models
- Victim cache, banked caches
- Non-uniform Memory/Cache architectures
- Cache compression and compaction
- Prefetching using buffers
- Cache side channel attacks, security
- GPU/TPU architectures

Related areas:

- Programming languages and parallelism
- Algorithms
- Compiler optimizations
- Memory management
- Simulation models
- Architecture specifications
- Circuit design
- Interconnect design
- Power optimizations
- Memory design
- Device technology
- ISA




## Exceptions

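- Control is the hardest part of the processor to design, and control hazards are tough to resolve.
- Exceptions and interrupts are another form of control hazard.
- Exception: any unexpected change in control flow caused by an internal event; it is handled by an exception handling routine.
- Interrupt: triggered by an external event and can be asynchronous; it is handled by an interrupt service routine (ISR).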

| Type of event                                 | From where? | MIPS terminology       |
|-----------------------------------------------|-------------|------------------------|
| I/O device request                            | External    | Interrupt              |
| Invoke the operating system from user program | Internal    | Exception              |
| Arithmetic overflow                           | Internal    | Exception              |
| Using an undefined instruction                | Internal    | Exception              |
| Hardware malfunctions                         | Either      | Exception or interrupt |





An invalid value in \$1 causes errors in the instructions that follow.

## Exceptions at different stages

- IF: fetching the instruction from memory causes a page fault; misaligned memory access
- ID: illegal opcode
- EX: division by 0, overflow
- MEM: fetching data causes a page fault; illegal address
- WB: no exceptions

## Exception handling

### The Hardware

- The pipeline has to stop executing the offending (excepting) instruction in midstream,
- let all preceding instructions complete,
- flush all succeeding instructions,
- set a register to show the cause of the exception,
- save the address of the offending instruction, and
- then jump to a predefined address or a vectored interrupt address (the address of the exception handler code).
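The steps above can be sketched as a small Python model. This is only an illustration under stated assumptions: `ToyMachine`, `take_exception`, and the list-based pipeline are hypothetical names invented for this sketch, and the handler address is the one quoted elsewhere in these notes.

```python
from dataclasses import dataclass, field

HANDLER_ADDR = 0x80000180  # handler address quoted in these notes


@dataclass
class ToyMachine:
    """Hypothetical machine state: just enough to show the exception steps."""
    pc: int = 0
    epc: int = 0
    cause: int = 0
    in_flight: list = field(default_factory=list)   # (address, text), oldest first
    completed: list = field(default_factory=list)


def take_exception(m, faulting_index, cause_code):
    """Apply the hardware steps to the instruction at in_flight[faulting_index]."""
    faulting_addr = m.in_flight[faulting_index][0]
    # 1. Let all preceding (older) instructions complete.
    m.completed.extend(m.in_flight[:faulting_index])
    # 2. Stop the offender and flush everything younger than it.
    m.in_flight.clear()
    # 3. Record the cause and the address of the offending instruction.
    m.cause = cause_code
    m.epc = faulting_addr
    # 4. Redirect fetch to the exception handler.
    m.pc = HANDLER_ADDR
    return m


m = ToyMachine(pc=0x50, in_flight=[(0x40, "lw"), (0x44, "add"),
                                   (0x48, "or"), (0x4C, "bne")])
take_exception(m, faulting_index=1, cause_code=0b10)  # overflow in add
print(hex(m.epc), hex(m.pc), m.completed)
```

The model treats "complete" and "flush" as list operations; real hardware does this by letting older instructions drain and by zeroing control signals in the pipeline registers of younger ones.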

### The Software

The software (OS) looks at the cause of the exception and deals with it.

- The OS either kills the program or resumes the instruction.
  - Which of the two depends on the processor implementation and the ISA.
- To be able to resume, the hardware must:
  - Save the address of the offending instruction.
  - Save any other information needed to return to the program.

### MIPS support

EPC register = exception program counter: a 32-bit register that contains the address of the instruction that caused the exception.

We also need to record what caused the exception. There are two ways to do this:

1. Cause register: a 32-bit status register used to record the cause of the exception (only some bits are used).
2. Vectored interrupt: the OS determines the cause from the address at which the handler is entered.

| Number | Name    | Description                                 |
|--------|---------|---------------------------------------------|
| 00     | INT     | External Interrupt                          |
| 01     | IBUS    | Instruction bus error (invalid instruction) |
| 10     | OVF     | Arithmetic overflow                         |
| 11     | SYSCALL | System call                                 |
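A handler that uses the cause-register approach has to decode the recorded number. The sketch below assumes, purely for illustration, that the two-bit code from the table sits in the low two bits of the register; the real MIPS Cause register keeps its exception code in a different field, and `decode_cause` is a hypothetical helper.

```python
# Codes copied from the table above.
CAUSE_CODES = {
    0b00: ("INT", "External interrupt"),
    0b01: ("IBUS", "Instruction bus error (invalid instruction)"),
    0b10: ("OVF", "Arithmetic overflow"),
    0b11: ("SYSCALL", "System call"),
}


def decode_cause(cause_reg):
    """Return (name, description) for the code in the low two bits (assumed layout)."""
    return CAUSE_CODES[cause_reg & 0b11]


print(decode_cause(0b10))
```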

### Co-processor for MIPS

- The co-processor contains registers useful for handling exceptions.
- These registers are not accessible in user mode; they are available only in kernel mode.
- It includes the Status register, Cause register, BadVAddr, and EPC (Exception Program Counter).

| Register name    | Register number    | Usage                                                              |
|------------------|--------------------|--------------------------------------------------------------------|
| BadVAddr         | 8                  | memory address at which an offending memory reference occurred     |
| Count            | 9                  | timer                                                              |
| Compare          | 11                 | value compared against timer that causes interrupt when they match |
| Status           | 12                 | interrupt mask and enable bits                                     |
| Cause            | 13                 | exception type and pending interrupt bits                          |
| EPC              | 14                 | address of instruction that caused exception                       |
| Config           | 16                 | configuration of machine                                           |

## MIPS support

- Control signals to write EPC, Cause, and any other status registers.
- Write the address of the excepting instruction into EPC, and add an input to the PC mux to select the exception handler address (MIPS uses $8000\,0180_{hex}$).
- Undo PC = PC + 4, since EPC should point to the offending instruction (not PC + 4).
  - So, do PC = PC - 4.
- Flush all succeeding instructions.

## MIPS support for vectored interrupts

| Exception type        | Exception vector address (in hex) |
|-----------------------|-----------------------------------|
| Undefined instruction | 8000 0000<sub>hex</sub>           |
| Arithmetic overflow   | 8000 0180<sub>hex</sub>           |

Each vector address is the entry point of the corresponding exception service routine (ESR). For example, the routine at the overflow vector begins:

| Address                 | Instruction          |
|-------------------------|----------------------|
| 8000 0180<sub>hex</sub> | sw \$26, 1000(\$0)   |
| 8000 0184<sub>hex</sub> | sw \$27, 1004(\$0)   |

Consider the following sequence, in which the add overflows. When is the overflow detected?

- lw \$1, 4(\$3)
- add \$2, \$3, \$4 (causes overflow; detected in the EX stage)
- or \$3, \$1, \$2
- bne \$1, \$3, Loop

Once the overflow is detected, the instructions that follow the add are replaced by nops, and the first instruction of the ISR/ESR (a sw) is fetched.

[Figure: MIPS datapath showing the instruction memory, register file, sign-extend unit, ALU, and data memory. Source: Computer Organization and Design, Patterson and Hennessy]







## Precise and imprecise exceptions

Precise --> the pipeline can be stopped so that

- the instructions just before the faulting instruction are completed, and
- the faulting instruction (and the instructions after it) can be restarted without altering the machine state.

If the exception is an overflow --> restart from the next instruction.

Imprecise --> with out-of-order completion, or in floating-point pipelines, a later instruction may already have completed when the exception is raised.


Suppose that LW has a misaligned address (not aligned on the word boundary)

When is it detected?

What should happen next?

Which pipeline registers to clear?

|                 | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 |
|-----------------|-----|-----|-----|-----|-----|-----|-----|-----|
| SLL R2, R2, 3   | IF  | ID  | EX  | MEM | WB  |     |     |     |
| LW R4, 4(R5)    |     | IF  | ID  | EX  | MEM | WB  |     |     |
| ADD R1, R2, R3  |     |     | IF  | ID  | EX  | MEM | WB  |     |
| SW R4, 4(R20)   |     |     |     | IF  | ID  | EX  | MEM | WB  |
| AND R10, R2, R3 |     |     |     |     | IF  | ID  |     |     |
Is it detected in the EX stage, after computing the address? Or in MEM, while accessing memory?

The IF/ID, ID/EX, and EX/MEM pipeline registers have to be cleared.

Save PC --> EPC.
Assume AND is in IF and has not yet updated PC to PC + 4.
What should EPC be?

|                 | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 |
|-----------------|-----|-----|-----|-----|-----|-----|-----|-----|
| SLL R2, R2, 3   | IF  | ID  | EX  | MEM | WB  |     |     |     |
| LW R4, 4(R5)    |     | IF  | ID  | EX  | MEM | WB  |     |     |
| ADD R1, R2, R3  |     |     | IF  | ID  | EX  | MEM | WB  |     |
| SW R4, 4(R20)   |     |     |     | IF  | ID  | EX  | MEM | WB  |
| AND R10, R2, R3 |     |     |     |     | IF  | ID  |     |     |

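AND is in IF and has not yet updated PC to PC + 4, so PC is pointing to AND.

The address of the instruction causing the exception goes into EPC. Its offset from PC depends on the stage in which LW raises the exception:

- Detected in MEM (CC5): PC points to AND, which is three instructions after LW, so EPC = PC - 12.
- Detected in EX (CC4): PC points to SW, which is two instructions after LW, so EPC = PC - 8.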

Now make PC get the ISR address
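The EPC arithmetic can be checked with a short Python sketch. It assumes sequential 4-byte instructions and a hypothetical base address of 0x100; `epc_from_pc` and `addr` are names introduced only for this illustration.

```python
WORD = 4  # bytes per MIPS instruction

program = ["SLL", "LW", "ADD", "SW", "AND"]   # program order from the table
BASE = 0x100                                   # hypothetical address of SLL


def addr(name):
    """Address of an instruction, assuming sequential 4-byte instructions."""
    return BASE + WORD * program.index(name)


def epc_from_pc(pc, instructions_behind):
    """EPC when the faulting instruction is this many instructions before PC."""
    return pc - WORD * instructions_behind


# Fault detected in MEM (CC5): AND is in IF, so PC points to AND.
epc_mem = epc_from_pc(addr("AND"), 3)   # PC - 12
# Fault detected in EX (CC4): SW is in IF, so PC points to SW.
epc_ex = epc_from_pc(addr("SW"), 2)     # PC - 8

print(hex(epc_mem), hex(epc_ex), hex(addr("LW")))
```

Both cases land on the address of LW; only the offset from the current PC differs.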

What if ADD was BEQ?

|                       | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 |
|-----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
| SLL R2, R2, 3         | IF  | ID  | EX  | MEM | WB  |     |     |     |
| LW R4, 4(R5)          |     | IF  | ID  | EX  | MEM | WB  |     |     |
| BEQ (in place of ADD) |     |     | IF  | ID  | EX  | MEM | WB  |     |
| SW R4, 4(R20)         |     |     |     | IF  | ID  | EX  | MEM | WB  |

Multiple exceptions in the same clock cycle:

- LW: misaligned memory access
- Next instruction: invalid opcode

|                | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 |
|----------------|-----|-----|-----|-----|-----|-----|-----|-----|
| SLL R2, R2, 3  | IF  | ID  | EX  | MEM | WB  |     |     |     |
| LW R4, 4(R5)   |     | IF  | ID  | EX  | MEM | WB  |     |     |
| Invalid opcode |     |     | IF  | ID  | EX  | MEM | WB  |     |
|                |     |     |     | IF  | ID  | EX  | MEM | WB  |

When multiple exceptions occur in the same clock cycle, the first (earliest) instruction in program order takes precedence.

Supporting this requires **complex** hardware.
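The precedence rule can be stated in a few lines of Python. This is a sketch, not a hardware description: `select_exception` and the tuple layout (program-order index, stage, cause) are assumptions made for the illustration.

```python
def select_exception(pending):
    """Pick the exception to service among those raised in the same cycle.

    pending: list of (program_order_index, stage, cause); a smaller index
    means an earlier instruction. Returns None if nothing is pending.
    """
    if not pending:
        return None
    return min(pending, key=lambda exc: exc[0])


# CC4 in the table above: LW (index 1) is in EX, the invalid opcode (index 2) is in ID.
same_cycle = [
    (2, "ID", "invalid opcode"),
    (1, "EX", "misaligned memory access"),
]
print(select_exception(same_cycle))
```

The later instruction's exception is not lost in practice: after the handler returns and the earlier instruction is dealt with, the later one is fetched again and raises its exception then.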



Out-of-order completion:

- Diverse pipeline
- Multiple clock cycle execution
- MUL --> overflow

|                   | CC1 | CC2 | CC3  | CC4  | CC5  | CC6  | CC7  | CC8 |
|-------------------|-----|-----|------|------|------|------|------|-----|
| MUL R1, R2, R4    | IF  | ID  | MUL1 | MUL2 | MUL3 | MUL4 | MUL5 | MEM |
| ADD R4, R5, R6    |     | IF  | ID   | EX   | MEM  | WB   |      |     |
|                   |     |     | IF   | ID   | EX   | MEM  | WB   |     |
|                   |     |     |      | IF   | ID   | EX   | MEM  | WB  |

ADD should not even have executed, as per the exception rules!

Instead, ADD has finished, exited the pipeline, and overwritten R4.

We cannot even find out which value of R4 caused the exception in MUL.



Imprecise exception



|                | CC1 | CC2 | CC3  | CC4  | CC5  | CC6  | CC7            | CC8            |
|----------------|-----|-----|------|------|------|------|----------------|----------------|
| MUL R1, R2, R4 | IF  | ID  | MUL1 | MUL2 | MUL3 | MUL4 | MUL5           | MEM            |
| ADD R4, R5, R6 |     | IF  | ID   | EX   | MEM  | WB   |                |                |
|                |     |     | IF   | ID   | EX   | MEM  | <del>WB</del>  |                |
|                |     |     |      | IF   | ID   | EX   | MEM            | <del>WB</del>  |

How can we solve this?

Checkpoint and rollback: we need to roll back the architectural (machine) state to the point before MUL and restore R4.

Flush MUL and ADD.
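A minimal Python sketch of checkpoint and rollback follows. It copies the whole register file at the checkpoint, which is an assumption chosen for simplicity; real designs typically track state with structures such as a reorder buffer instead. `CheckpointedRegs` and its methods are hypothetical names, and the register values are made up.

```python
class CheckpointedRegs:
    """Register file that can snapshot and restore its contents."""

    def __init__(self, values):
        self.regs = dict(values)
        self._snapshots = {}

    def checkpoint(self, tag):
        # Save a copy of the architectural state before a long-latency instruction.
        self._snapshots[tag] = dict(self.regs)

    def rollback(self, tag):
        # Restore the state saved at the checkpoint, undoing later writes.
        self.regs = self._snapshots.pop(tag)

    def commit(self, tag):
        # The instruction finished without an exception; drop the snapshot.
        self._snapshots.pop(tag, None)


rf = CheckpointedRegs({"R2": 3, "R4": 7, "R5": 10, "R6": 20})
rf.checkpoint("MUL")                              # MUL R1, R2, R4 enters the pipeline
rf.regs["R4"] = rf.regs["R5"] + rf.regs["R6"]     # ADD R4, R5, R6 completes early
rf.rollback("MUL")                                # MUL overflows: restore R4
print(rf.regs["R4"])
```

After the rollback, R4 again holds the value that MUL read, so the handler sees a state consistent with MUL being the first unfinished instruction.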



The ADD instruction's page fault (in IF) occurs before (in time) the LW page fault (in MEM).

If we are implementing precise exceptions, we must finish the LW before handling the ADD page fault.

We would then detect the LW's exception first and resolve it.

| Clock number | 1  | 2  | 3  | 4   | 5   | 6  |
|--------------|----|----|----|-----|-----|----|
| LW           | IF | ID | EX | MEM | WB  |    |
| ADD          |    | IF | ID | EX  | MEM | WB |

Wait to handle an exception until a "last" point: a well-defined point in the pipeline after which the machine state changes, such as write-back or the end of the memory stage.

Set an exception field in the pipeline stage, carried along with the instruction, and move ahead.

The memory stage looks at this field to decide which instruction should be the precise exception point.

For ADD: make a note of the IF exception, but don't resolve it until that point, when we are sure there are no exceptions from previous instructions.
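The deferral scheme can be simulated in Python. The sketch assumes an ideal five-stage pipeline with one instruction entering per cycle and no stalls, and it uses the memory stage as the "last" point; `first_precise_exception` and the fault dictionaries are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def first_precise_exception(program):
    """program: list of (name, {stage: cause}) in program order.

    Each instruction carries an exception field that is set when a fault
    occurs but only acted on in MEM. Returns (cycle, name, cause) or None.
    """
    flags = [None] * len(program)
    last_cycle = len(program) + len(STAGES)
    for cycle in range(1, last_cycle + 1):
        for i, (name, faults) in enumerate(program):   # older instructions first
            stage_idx = cycle - 1 - i                  # instruction i enters IF in cycle i + 1
            if not 0 <= stage_idx < len(STAGES):
                continue
            stage = STAGES[stage_idx]
            if flags[i] is None and stage in faults:
                flags[i] = faults[stage]               # note the fault, keep going
            if stage == "MEM" and flags[i] is not None:
                return cycle, name, flags[i]           # precise exception point
    return None


program = [
    ("LW", {"MEM": "data page fault"}),
    ("ADD", {"IF": "instruction page fault"}),
]
print(first_precise_exception(program))
```

ADD's fault is raised in cycle 2 but is only noted; LW reaches MEM in cycle 4 and its fault is the one that gets handled first.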



### MIPS support

- Additional instructions:
  - mfc0: moves EPC into one of the general-purpose registers, e.g. mfc0 \$s1, \$epc, so that we can return from the exception handler using jr.
  - syscall: executes a system call. The system call number should be set in register \$v0.
  - rfe: return from exception.

## Example

- Assume the add overflows and overwrites \$1. The user will then never know what original value of \$1 caused the exception.
  - So it is important to stop execution in the middle of the pipeline (EX) and prevent write-back.
  - Introduce an EX.Flush signal.
  - Save the address of the offending instruction in EPC.
  - The ALU output should raise an overflow flag to the control unit, which in turn generates the flush signals.
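The overflow check and the suppressed write-back can be sketched in Python. Signed 32-bit overflow occurs when both operands have the same sign and the result has the opposite sign. The function names and the signal dictionary (`ex_flush`, `reg_write`) are hypothetical stand-ins for the control signals described above.

```python
MASK32 = 0xFFFFFFFF


def sign_bit(x):
    return (x >> 31) & 1


def add32(a, b):
    """Signed 32-bit add. Returns (wrapped_result, overflow_flag)."""
    ua, ub = a & MASK32, b & MASK32
    raw = (ua + ub) & MASK32
    overflow = sign_bit(ua) == sign_bit(ub) and sign_bit(raw) != sign_bit(ua)
    result = raw - (1 << 32) if sign_bit(raw) else raw
    return result, overflow


def execute_add(regs, dest, src1, src2):
    """EX stage for add: on overflow, assert the flush and skip write-back."""
    result, overflow = add32(regs[src1], regs[src2])
    if overflow:
        return {"ex_flush": True, "reg_write": False}   # destination keeps its old value
    regs[dest] = result
    return {"ex_flush": False, "reg_write": True}


regs = {"$1": 0x7FFFFFFF, "$2": 1}
signals = execute_add(regs, "$1", "$2", "$1")   # add $1, $2, $1
print(signals, hex(regs["$1"]))
```

Because the write-back is suppressed, \$1 still holds the operand that caused the overflow when the handler runs.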

### **Exception in a Pipelined Computer**

Given this instruction sequence,

Show what happens in the pipeline if an overflow exception occurs in the add instruction.



### Overflow detected in EX of clock 6. Causes Flush of ADD and



### Prior instructions complete. Future instructions flushed. Start from ISR



### Acknowledgements

- CS305, IIT Bombay, Bhaskaran Raman
- CS152: Computer Architecture, UCB, http://www-inst.eecs.berkeley.edu/~cs152/sp12/lectures/L05-PipeliningII.pdf
- CMSC 611: UMN, http://ece-research.unm.edu/jimp/611/slides/chap3\_5.html
- CSCE430/830, Univ of Maine
- Computer Organization and Design, Patterson and Hennessy